Little is known about the genetics or genomics of Panax ginseng. In this study, we developed 70 polymorphic expressed sequence tag-derived simple sequence repeat markers by testing 140 primer pairs. All 70 markers showed reproducible polymorphism among four Panax species, and 19 of them were polymorphic among six P. ginseng accessions. These markers segregated in a 1:2:1 Mendelian ratio in an F2 population derived from a cross between two P. ginseng cultivars, 'Yunpoong' and 'Chunpoong', indicating that they are reproducible, heritable and mappable markers. A phylogenetic analysis using the genotype data showed three distinct groups, a P. ginseng-P. japonicus clade, P. notoginseng and P. quinquefolius, at a similarity coefficient of 0.70. P. japonicus was intermingled with the P. ginseng cultivars, indicating that the two species have similar genetic backgrounds. The P. ginseng accessions were subdivided into three minor groups: the cultivar 'Chunpoong' on its own; a subgroup of three accessions comprising two cultivars ('Gumpoong' and 'Yunpoong') and one landrace ('Hwangsook'); and another subgroup of two accessions comprising one cultivar ('Gopoong') and one landrace ('Jakyung'). Each primer pair produced one to four bands, suggesting that the ginseng genome has a highly replicated paleopolyploid structure.

Keywords: Panax species, Expressed sequence tag-simple sequence repeat, Ginseng cultivars, Genetic diversity, Cultivar 
INTRODUCTION

Korean ginseng (Panax ginseng Meyer) is an important medicinal herb belonging to the family Araliaceae. Ginseng has been used in oriental medicine for thousands of years [1]. Its major pharmacologically active components are the ginsenosides, which are known for their beneficial effects on the central nervous, cardiovascular, endocrine and immune systems [2].

In ginseng research, medicinal components and their functions have been widely investigated. However, breeding, genetic and genomic studies have rarely been performed because of the difficulty of maintaining plants and producing progenies. Three to four years of growth are needed to produce a small number of seeds, approximately 40 seeds per plant [3], which hinders the systematic management of genetic materials. To date, eight elite cultivars, 'Chunpoong', 'Yunpoong', 'Gumpoong', 'Gopoong', 'Sunpoong', 'Sunwon', 'Sunone' and 'Chungsun', have been bred by pure line selection and registered as commercial varieties in Korea since 1997 [4]. Even though the new varieties show better yields and quality than the local landrace, which is a native mixed line [3,4], they are cultivated in less than 10% of the total ginseng cultivation area because of the lack of a well-organized seed production and supply system. Stable seed supply systems with a credible authentication method for each ginseng cultivar will promote high-quality ginseng production by improving ginseng breeding and the seed industry.

Molecular breeding tools using DNA markers may be an alternative and indispensable route to ginseng improvement, because marker-assisted selection can reduce the effort and time needed for breeding. However, only a limited number of DNA markers have been reported for ginseng. Random DNA markers such as random amplified polymorphic DNA [5-7] and amplified fragment length polymorphism (AFLP) [8] have been used to study the diversity of local ginseng collections, but such random primer-based markers are difficult to share and compare across laboratories. Approximately 60 simple sequence repeat (SSR) markers have been produced from SSR-enriched libraries [9,10] and from bacterial artificial chromosome (BAC) end sequences [11] and have been used to determine the genetic diversity of ginseng collections. All of these SSR markers were derived from genomic sequences, and they have not been intensively studied among ginseng cultivars. Even though several papers have described ginseng expressed sequence tags (ESTs) [12-15], there have been no reports on the development of EST-derived SSR markers or their use in ginseng.

ESTs, which provide comprehensive transcript information [16], are valuable resources for developing molecular markers because they are derived from relatively conserved genic regions. EST-derived SSRs are also more advantageous than genomic SSRs because of the wide public availability of EST sequences and their high transferability to related species [17]. We therefore aim to develop a large number of EST-derived SSR markers and to construct a high-resolution genetic map that can serve as a framework for genome sequencing. In this study, we tried to develop reproducible EST-derived SSR markers that can be applied to mapping and to assessing genetic similarity among registered commercial inbred varieties. We also estimated the polyploidy level of the ginseng genome based on the numbers of gene-based polymerase chain reaction (PCR) products.



MATERIALS AND METHODS 

Plant materials and DNA extraction 

Six P. ginseng accessions and three related Panax species were used to determine genetic diversity. The P. ginseng accessions comprised four registered cultivars, 'Chunpoong', 'Yunpoong', 'Gumpoong' and 'Gopoong', bred by inbred line selection at the Korea Ginseng Corporation (KGC) Natural Resources Research Institute (Daejeon, Korea), and two representative local landraces, 'Jakyung' (mixed lines with red fruits) and 'Hwangsook' (mixed lines with yellow fruits). The related species were P. quinquefolius, originating from the USA, P. japonicus, originating from Japan, and P. notoginseng, originating from China. Each P. ginseng cultivar and landrace was represented by a DNA pool consisting of equal amounts of template DNA from 15 individuals, and P. quinquefolius was represented by a pool from five individuals. Because of limited materials, DNA from a single individual was used to represent each of P. japonicus and P. notoginseng. An F2 population of 51 individuals from a cross between 'Yunpoong' and 'Chunpoong' was used to determine the inheritance and reproducibility of the newly developed markers. All leaf samples were kindly provided by the KGC Central Research Institute. Total DNA was extracted using the modified cetyltrimethylammonium bromide method [18]. DNA concentrations were measured using an ND-1000 spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE, USA).
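Checking whether a codominant marker segregates 1:2:1 in the F2 population is commonly done with a chi-square goodness-of-fit test. The Python sketch below is only an illustration under stated assumptions: the source does not say which test was applied, the genotype counts are hypothetical values for a 51-plant F2 (not observed data), and 5.991 is the standard chi-square critical value for 2 degrees of freedom at alpha = 0.05.

```python
def chi_square_121(n_aa, n_ab, n_bb):
    """Chi-square statistic for F2 genotype counts against a 1:2:1 ratio."""
    n = n_aa + n_ab + n_bb
    expected = (n / 4, n / 2, n / 4)  # parent-A homozygote, heterozygote, parent-B homozygote
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_2DF_05 = 5.991  # chi-square critical value, 2 degrees of freedom, alpha = 0.05

# Hypothetical genotype counts for one marker in a 51-plant F2 (illustration only)
stat = chi_square_121(13, 25, 13)
print(round(stat, 4), stat < CRITICAL_2DF_05)  # 0.0196 True -> no deviation from 1:2:1
```

A statistic below the critical value means the counts are consistent with Mendelian 1:2:1 segregation; a marker scored dominantly would instead be tested against 3:1 with 1 degree of freedom.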

Construction of the ginseng expressed sequence tag database and repeat motif screening

A ginseng EST database was constructed by collecting sequences from public databases. After poly(A) tails were removed using PanGEA [19], repeat-oriented sequences and SSR motif-containing sequences in the raw EST database were characterized using RepeatMasker ver. 3.2.6 (http://www.repeatmasker.org), which was downloaded and installed on a local computer. In the screening process, the default mode with the "-poly" option was used to select genuine SSR motifs.
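RepeatMasker (and Tandem Repeats Finder, used below) apply their own algorithms, which also tolerate imperfect repeats; the sketch here is not their method. It is a minimal Python illustration of what "finding SSR motifs" means, assuming perfect repeats only, motif lengths of 1 to 6 nt and a hypothetical threshold of at least three copies, run on a made-up EST fragment.

```python
import re


def find_ssrs(seq, max_unit=6, min_copies=3):
    """Find perfect tandem repeats (SSRs) with motif lengths 1..max_unit.

    Returns sorted (start, motif, copies) tuples. A motif that is itself a
    repeat of a shorter unit (e.g. 'TATA' = 'TA' x 2) is skipped, so each run
    is reported once, under its shortest motif.
    """
    seq = seq.upper()
    hits = []
    for unit in range(1, max_unit + 1):
        # ([ACGT]{unit}) captures a motif; \1{min_copies-1,} demands further exact copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if any(unit % k == 0 and motif == motif[:k] * (unit // k) for k in range(1, unit)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


est = "GGC" + "GAA" * 5 + "CT" + "TA" * 6 + "G"  # hypothetical EST fragment
print(find_ssrs(est))  # [(3, 'GAA', 5), (20, 'TA', 6)]
```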

Designing primers 

We extracted ESTs containing 3-6 copies of SSR motifs using Tandem Repeats Finder [20]. Primer pairs of 18 to 27 nucleotides were designed from the sequences flanking the SSR motifs (Table 1) using the Primer3 program (http://frodo.wi.mit.edu/primer3/), with expected product sizes of 150 to 600 bp. Standalone BLAST executables (BLASTN 2.2.15, ftp://ncbi.nlm.nih.gov/blast/executables) were used to avoid designing primers in duplicated sequences. SSR markers reported in previous papers [9,10] were also designed and tested together with the new markers.

http://dx.doi.org/10.5142/jgr.2011.35.4.399

Choi et al. EST-SSR Markers in Panax ginseng and Related Species

Table 1. Simple sequence repeats found in ginseng EST sequences and summary of SSR marker development

Repeat unit (length) | SSR 1) | Primer design 2) | PCR success (%) | Polymorphic between Panax ginseng cultivars (%) | Polymorphic between Panax species (%)
Mono | 855 | 11 | 4 (36.4) | 0 | 0
Di | 379 | 28 | 23 (82.1) | 9 (39.1) | 16 (69.6)
Tri | 246 | 45 | 41 (91.1) | 9 (22.0) | 30 (73.2)
Tetra | 87 | 10 | 8 (80.0) | 0 | 2 (25.0)
Penta | 70 | 12 | 12 (100) | 1 (8.3) | 11 (91.7)
Hexa | 34 | 5 | 4 (80.0) | 0 | 3 (75.0)
Degenerate | - | 29 | 27 (93.1) | 0 | 8 (29.6)
Total | 1,671 | 140 | 119 (85.0) | 19 (16.0*) | 70 (58.8)

EST, expressed sequence tag; SSR, simple sequence repeat; PCR, polymerase chain reaction.
1) No. of potentially polymorphic SSRs based on the poly option of the RepeatMasker program.
2) No. of sites used for primer designing.
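The primer constraints described under Designing primers (18-27 nt primers) can be screened quickly before running Primer3. The sketch below is an assumption-laden illustration: it uses the basic GC-count melting-temperature formula, Tm = 64.9 + 41(nGC - 16.4)/N, which is not Primer3's nearest-neighbour thermodynamic model, so its Tm values will differ from Primer3 output. The example primer is the GES0001 forward primer from Table 2.

```python
def gc_content(primer):
    """Fraction of G + C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def basic_tm(primer):
    """Rough Tm (degrees C) from the basic formula 64.9 + 41 * (nGC - 16.4) / N.

    Primer3 uses a nearest-neighbour model instead, so treat this as a sanity check only.
    """
    primer = primer.upper()
    n_gc = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(primer)


def passes_length_window(primer, lo=18, hi=27):
    """True if the primer length falls inside the 18-27 nt design window."""
    return lo <= len(primer) <= hi


fwd = "GCATGGCAATTTGGAGAGAGGTACG"  # GES0001 forward primer (Table 2)
print(len(fwd), round(gc_content(fwd), 2), round(basic_tm(fwd), 1), passes_length_window(fwd))
# 25 0.52 59.3 True
```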

Polymerase chain reaction and electrophoresis 

PCR amplifications were performed in a 25 µL volume with 1 U Taq DNA polymerase (Vivagen, Seongnam, Korea) according to the manufacturer's protocol using a DNA Engine Thermal Cycler (Bio-Rad, Hercules, CA, USA). The cycling conditions were as follows: 5 min at 95°C for initial denaturation; 38 cycles of 10 s at 95°C, 30 s at the primer-specific annealing temperature (Ta, Table 2) and 20 s at 72°C; and 10 min at 72°C for final extension. PCR products were separated on 2% agarose gels and on 5% denaturing or 9% non-denaturing polyacrylamide gels.
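The cycling program above can be written down as data, which makes it easy to check the total run length. This is a minimal sketch, assuming ramp times between temperatures are ignored; the 56°C annealing value is a placeholder chosen for illustration, since the actual Ta differs per marker (Table 2).

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float  # block temperature in degrees C
    seconds: int   # hold time at that temperature


def hold_time(initial, cycle, n_cycles, final):
    """Total hold time in seconds; ramping between temperatures is ignored."""
    def total(steps):
        return sum(s.seconds for s in steps)
    return total(initial) + n_cycles * total(cycle) + total(final)


ANNEALING_C = 56.0  # placeholder; per-marker Ta values are listed in Table 2

initial = [Step("initial denaturation", 95.0, 300)]
cycle = [
    Step("denaturation", 95.0, 10),
    Step("annealing", ANNEALING_C, 30),
    Step("extension", 72.0, 20),
]
final = [Step("final extension", 72.0, 600)]

seconds = hold_time(initial, cycle, 38, final)
print(seconds, "s =", seconds // 60, "min")  # 3180 s = 53 min
```

Real run times are longer because the cycler also spends time ramping between 95, Ta and 72°C.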

Data analysis 

Blast2GO ver. 2.4.2 was used with default parameters to annotate the polymorphic ESTs [21]. Marker data were analyzed using PowerMarker ver. 3.25 [22]. Allele data were converted into a binary matrix and imported into NTSYSpc 2.11X (Exeter Software, Setauket, NY, USA) for phylogenetic analysis using Dice's coefficient [23] and the unweighted pair group method with arithmetic mean (UPGMA) [24]. Bootstrapping of the tree with 1,000 replications was performed in Winboot [25].
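The similarity-and-clustering step can be illustrated in a few lines. This is not NTSYSpc or Winboot code; it is a minimal pure-Python sketch of Dice similarity, 2a/(2a + b + c), on a binary band matrix followed by UPGMA (size-weighted average linkage) on the distance 1 - Dice. The three-accession matrix is hypothetical, and bootstrapping is omitted.

```python
from itertools import combinations


def dice_similarity(x, y):
    """Dice coefficient 2a / (2a + b + c) for two 0/1 band profiles."""
    a = sum(1 for u, v in zip(x, y) if u and v)      # bands present in both
    b = sum(1 for u, v in zip(x, y) if u and not v)  # bands only in x
    c = sum(1 for u, v in zip(x, y) if v and not u)  # bands only in y
    return 2 * a / (2 * a + b + c) if (a + b + c) else 1.0


def upgma(profiles):
    """UPGMA clustering on Dice distances (1 - similarity).

    profiles maps an accession name to its 0/1 band vector. Returns the merge
    history as (cluster_a, cluster_b, similarity_at_merge) tuples.
    """
    size = {name: 1 for name in profiles}
    dist = {
        frozenset((i, j)): 1 - dice_similarity(profiles[i], profiles[j])
        for i, j in combinations(profiles, 2)
    }
    merges = []
    while len(size) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        d_ab = dist.pop(pair)
        new = f"({a},{b})"
        for k in list(size):
            if k in (a, b):
                continue
            d_ak = dist.pop(frozenset((a, k)))
            d_bk = dist.pop(frozenset((b, k)))
            # size-weighted mean distance: this weighting is what makes it UPGMA
            dist[frozenset((new, k))] = (size[a] * d_ak + size[b] * d_bk) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        merges.append((a, b, round(1 - d_ab, 4)))
    return merges


# Hypothetical 0/1 band matrix (rows = accessions, columns = scored bands)
bands = {"A": [1, 1, 0, 1, 0], "B": [1, 1, 0, 1, 1], "C": [0, 1, 1, 0, 1]}
for step in upgma(bands):
    print(step)
# ('A', 'B', 0.8571)
# ('(A,B)', 'C', 0.4524)
```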

RESULTS AND DISCUSSION 

Simple sequence repeat motifs in ginseng expressed sequence tag sequences

A total of 19,578 ginseng ESTs (ca. 11 Mb) were collected from the public databases and reconstructed on our local server (http://im-crop.snu.ac.kr). During EST trimming, ca. 17.5 kb of poly(A) tails were removed, and 1.6% (179,540 bp) of the EST sequence was masked as repeats. Of this, 1,584 bp showed significant homology with 19 kinds of transposable elements, 8,619 bp showed significant homology with 32 non-coding RNA elements, and 62,348 bp from 1,344 regions were classified as low-complexity DNA. A total of 2,158 regions spanning 108,116 bp were classified as simple sequence repeats, and 1,671 sites spanning 68,763 bp in 1,567 ESTs (8.0% of the total raw ESTs) were classified as potentially polymorphic SSR sites by the "-poly" option (Table 1). Most of these ESTs (94.6%) contained one SSR motif; the remainder contained two SSRs (4.6%) or three to five SSRs (0.8%).

Classification by repeat-motif length showed that 51% of SSRs were mononucleotide repeats, but mononucleotide repeats gave poor success rates for both PCR and polymorphism detection, possibly because of "stutter" artifacts in the PCR reaction [26]. Excluding mononucleotide repeats, di- and trinucleotide repeats were the most abundant classes, accounting for 46.4% and 30.1% of SSRs, respectively. Dinucleotide repeat motifs were likewise predominant in kiwifruit, spruce and coffee [27-29], whereas trinucleotide repeat motifs were the most predominant in the EST-SSRs of grape, sugarcane, barley, wheat, maize, sorghum, rice, rye, oats and flax [30-36].
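The percentages above follow directly from the SSR-site counts in Table 1. A short Python check, assuming the shares are computed over the 1,671 "-poly" sites (and over the 816 non-mononucleotide sites when mononucleotides are excluded):

```python
# Potentially polymorphic SSR sites per repeat class (Table 1)
ssr_sites = {"mono": 855, "di": 379, "tri": 246, "tetra": 87, "penta": 70, "hexa": 34}


def class_shares(counts, exclude=()):
    """Percentage of SSR sites in each repeat class, optionally excluding some classes."""
    kept = {k: v for k, v in counts.items() if k not in exclude}
    total = sum(kept.values())
    return {k: round(100 * v / total, 1) for k, v in kept.items()}


print(class_shares(ssr_sites)["mono"])  # 51.2
non_mono = class_shares(ssr_sites, exclude=("mono",))
print(non_mono["di"], non_mono["tri"])  # 46.4 30.1
```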

Assessment of homogeneity in ginseng landraces and registered cultivars

During the development of DNA markers polymorphic between accessions, we found that genotypes were not homogeneous among individuals within a local landrace accession or within the registered cultivars (Fig. 1). Individuals of the most popular landrace, 'Jakyung', showed up to six different genotypes among 15 individuals, even though all were derived from the same seed lot (Fig. 1A). In contrast, the elite cultivar 'Chunpoong' showed a relatively uniform genotype, with two off-types among 20 individuals randomly selected from the same seed lot (Fig. 1B). Screening with three markers developed in this study revealed heterogeneity ranging from 10% to 30% in the six accessions. We also found that previously reported polymorphic ginseng markers [9,10,31] were not reproducible in our trials. This may be due to the heterogeneity of ginseng populations, because those studies used DNA from a single plant to represent each accession.

Fig. 1. Allelic variation among individuals of a landrace and a cultivar. Denaturing polyacrylamide gel electrophoresis was used to separate PCR products amplified from individual DNA with the GES0002 marker. (A) A total of 15 individuals of the landrace 'Jakyung' were surveyed. Different genotypes are denoted a-f, and genotype d is the major genotype. (B) A total of 20 individuals of the cultivar 'Chunpoong' were surveyed. Only plant number 4 (denoted by *) shows a heterozygous allele. L, DNA ladder; GES, ginseng expressed sequence tag-simple sequence repeat.

Approximately 56% of AFLP bands were polymorphic among wild P. ginseng individuals in the Russian Primorye area, and a population structure study clearly differentiated their phylogenetic relationships based on the frequencies of individual alleles [37]. Similarly, more divergence was detected in wild than in cultivated P. quinquefolius [38], which might derive from occasional out-crossing even though the plants prefer self-fertilization [39]. Cultivated P. notoginseng populations have also retained a fair level of diversity, ranging from 39% to 74% divergence depending on location in China [40]. Our results showed abundant genetic diversity even in cultivated P. ginseng landraces (Fig. 1A), which might derive from bulked seed harvesting of genetically unfixed lines and also from occasional out-crossing [39]. Meanwhile, the eight elite cultivars were bred by pure line selection, although each showed approximately 10% off-type alleles. We therefore considered that a DNA pool from many individuals would identify more reproducible markers than a single plant representing each Korean ginseng cultivar, even though pooling can miss many rare alleles. We decided to use a DNA pool derived from 15 individual plants to represent each Korean ginseng cultivar, because our purpose was to develop markers that are generally representative of each elite cultivar (Fig. 1B).

Development of simple sequence repeat markers and their transferability to related species

We designed a total of 140 primer pairs amplifying 111 SSR motifs and 29 degenerate SSR motifs (Table 1). Among them, 119 pairs produced bands, of which 105 were of the expected size and 16 were larger than expected, whereas 21 primer pairs produced no clear PCR product. The PCR failure rate was therefore 15% for the 140 EST-SSR primer pairs tested, within the 10% to 40% range reported in previous EST-SSR analyses of other species [27,30,31,35,41-44]. Seventy pairs, corresponding to 58.8% of the primer pairs that amplified successfully, showed polymorphism in at least one of the nine accessions (six P. ginseng accessions and three related species). Polymorphisms were mainly interspecific: only 19 of the 70 SSRs were polymorphic among P. ginseng cultivars as well as among Panax species, and these were named ginseng EST-SSR (GES) markers (Fig. 2A, B; Tables 1 and 2). The other 51 markers were polymorphic only among Panax species. Of these, 43 were derived from intact SSR regions and were named Panax EST-SSR (PES) markers (Fig. 2C, D; Tables 1 and 2), and eight were derived from degenerate SSR regions and were named Panax EST (PE) markers (Tables 1 and 2). BLASTX analysis revealed that 51 of the 70 polymorphic ESTs had significant hits, whereas 19 had no match with known proteins. Among the 51 significant hits, 22 showed best hits with genes of Vitis vinifera (Table 2).
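The success and polymorphism rates quoted above can be recomputed from the Table 1 counts. In this sketch the PCR success rate is taken relative to the primer pairs designed, and both polymorphism rates relative to the pairs that amplified, which is the reading that reproduces the published percentages.

```python
# Per repeat class: (primer pairs designed, PCR successes,
#                    polymorphic among P. ginseng cultivars, polymorphic among Panax species)
table1 = {
    "mono": (11, 4, 0, 0),
    "di": (28, 23, 9, 16),
    "tri": (45, 41, 9, 30),
    "tetra": (10, 8, 0, 2),
    "penta": (12, 12, 1, 11),
    "hexa": (5, 4, 0, 3),
    "degenerate": (29, 27, 0, 8),
}


def pct(part, whole):
    """Percentage rounded to one decimal, 0.0 when the denominator is zero."""
    return round(100 * part / whole, 1) if whole else 0.0


def rates(designed, success, cultivar_poly, species_poly):
    """PCR success vs. pairs designed; polymorphism rates vs. successful pairs."""
    return pct(success, designed), pct(cultivar_poly, success), pct(species_poly, success)


totals = tuple(sum(col) for col in zip(*table1.values()))
print(totals, rates(*totals))  # (140, 119, 19, 70) (85.0, 16.0, 58.8)
print(rates(*table1["di"]))    # (82.1, 39.1, 69.6)
```

The 15% PCR failure rate in the text is simply 100 - 85.0.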

The mean level of polymorphism was 20% at the species level, similar to that reported in previous EST-SSR studies of other species [30,35,45-48]. EST-SSRs are generally less polymorphic than genomic SSRs because transcribed regions are more conserved than non-coding regions [17,29,34,49-52]. Nevertheless, our gene-based SSR markers showed levels of polymorphism comparable to those of genomic SSR markers for distinguishing P. ginseng accessions. Only 22 of 189 (11.6%) [10] and 11 of 94 (11.7%) [9] genomic SSR primers were polymorphic among P. ginseng accessions when the primers were designed from microsatellite-enriched libraries. Meanwhile, 12 of 31 BAC end sequence-derived genomic SSR primers (38.7%) were polymorphic among P. ginseng accessions [11,53].

Table 2. Characteristics of 70 polymorphic expressed sequence tag-simple sequence repeat loci in Panax ginseng cultivars and related species

Marker | Primer pair (5'→3'), forward / reverse | Repeat motif | Ta (°C) | Size 1) | Nr 2) | Na 3) | MAF | GD | PIC | Nb 4) | Sequence description | Min. E-value
GES0001 | GCATGGCAATTTGGAGAGAGGTACG / GTTCTGTGACTTGCCGGTTTGCTCC | (GAA)n | 60 | 196 | 14 | 7 | 0.3333 | 0.8148 | 0.7938 | 3 | No hits found |
GES0002 | GTGGTGAGAAAGGGAAAGAGCAATCG / CCCTCGATCTACAGATGATCAAATAGC | (TA)n | 60 | 176 | 22 | 7 | 0.2222 | 0.8395 | 0.8194 | 2 | No hits found |
GES0003 | TTTCAAATGGATCTATGAGAATAATGA / TGGGCACATAAAAAGACAGTG | (TTC)n | 54 | 247 | 20 | 5 | 0.4444 | 0.7160 | 0.6773 | 3 | No hits found |
GES0004 | CTAGTCAACAACATCATCATCATCC / TGCAGAATTAACTGAGACTCAAGAA | (TA)n | 56 | 213 | 27 | 5 | 0.3333 | 0.7407 | 0.6987 | 4 | Sugar isomerase | 1.E-12
GES0005 | TTCTTCCTTGCACGTTTCTACTACT / AATATAATTGCTACACTCCCCTTGG | (CCA)n | 56 | 190 | 14 | 7 | 0.2222 | 0.8395 | 0.8194 | 2 | at5g14920 f2g14_40 | 2.E-28
GES0006 | AGCCTAGTGTGCAGAAGTAAAGTGT / TGAAGTAGAACTGATCACAGAGTGC | (TA)n | 56 | 238 | 26 | 4 | 0.4444 | 0.6667 | 0.6072 | 3 | Protein | 2.E-15
GES0007 | GGGGCTTCTCTAATTTACACCTTTA / AAAGATGAAAACTTGATGCTTGTTC | (GA)n | 55 | 243 | 16 | 4 | 0.4444 | 0.6914 | 0.6401 | 3 | Protein | 9.E-41
GES0008 | TGGTCTAGAACAGAAAAGATCGAGT / GTACTGTCTGTGGTTTGAATGATTG | (TA)n | 55 | 187 | 28 | 5 | 0.4444 | 0.7160 | 0.6773 | 2 | No hits found |
GES0009 | TGAACCACATGATTTACGATTAGTG / GACATATCTGCATGGCTTTCTTAAT | (TA)n | 56 | 240 | 11 | 6 | 0.3333 | 0.7901 | 0.7615 | 2 | Xyloglucan endotransglycosylase | illegible
GES0010 | AGGACTTCAATGCTAGAACTCAGAA / CATGGGCTAAATAATAAAAGACCAA | (TA)n | 56 | 285 | 17 | 4 | 0.4444 | 0.6914 | 0.6401 | 2 | Mitochondrial ribosomal protein (partly illegible) | illegible
GES0011 | GTTATGACCGGTAAATTAGGTTGGT / CAATCCACATCAAGACCATATTACA | (TA)n | 56 | 229 | 11 | 4 | 0.5556 | 0.6173 | 0.5688 | 2 | Protein | 2.E-46
GES0012 | TAATTATATTTGTGTTGGCAGACGA / CTCGGCATACAACATTTAACTTACC | (TC)n | 54 | 227 | 15 | 5 | 0.4444 | 0.7160 | 0.6773 | 3 | Predicted hypothetical protein [Vitis vinifera] | illegible
GES0013 | ATTAGTAGAACGTACAGCCCAAACC / TATGGTAACTTTAGGCTGGTGTAGC | (CCA)n | 56 | 230 | | 5 | 0.4444 | 0.7160 | 0.6773 | 2 | Carboxyphosphonoenolpyruvate mutase |
GES0014 | GAGAAAATTGAGGAACCAAACAAG / GTTTTCTCCACAACTACTGGCTCT | (CTA)n | 54 | 299 | 12 | 4 | 0.6667 | 0.5185 | 0.4847 | 4 | No hits found |
GES0015 | AAAATTCTGCTCACACTCTCTCTGT / CGGAGTTTTTGAAGATAAGAATCAA | (CTA)n | 56 | 193 | 10 | 4 | 0.5556 | 0.6173 | 0.5688 | 2 | No hits found |
GES0016 | ATTTATATATCTTCACGCTGCTTCG / CAAAAATAAGAGATGGAGATGGAGA | (TCC)n | 56 | 230 | 10 | 4 | 0.4444 | 0.6667 | 0.6072 | 3 | Beta-galactosidase alpha-peptide | 5.E-13
GES0017 | AAAATGGTTCCAAATTGTGCTTC / AAGGTGGAAATAAGGAGAGAAAAGA | (TTA)n | 56 | 239 | 11 | 5 | 0.3333 | 0.7654 | 0.7279 | 4 | No hits found |
GES0018 | CTCTCTCTCTCTCTCTCTCATCTGC / AAAGAAGAACCACAAACACTAAACG | (TTC)n | 56 | 170 | 8 | 4 | 0.6667 | 0.5185 | 0.4847 | 2 | Protein | 1.E-46
GES0019 | GTACTATGGATAAAGCTGGAATGGA / CGGTAAGTGACACTAAGAACAACTG | (TAGGG)n | 56 | 207 | 6 | 5 | 0.3333 | 0.7654 | 0.7279 | 2 | No hits found |
PES0001 | GGAGCAGCAATAGACCAAGG / TTGTTTGGAAACCTGGGAAC | (CCCTG)n | 55 | 356 | 4 | 2 | 0.7778 | 0.3457 | 0.2859 | 3 | Predicted: hypothetical protein [V. vinifera] | 1.E-04
PES0002 | illegible / CGTCTTCATCATCCTGAGCA | (ATG)n | 56 | 290 | 9 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Nascent polypeptide-associated complex alpha chain | 1.E-68
PES0003 | GGTGGAGATCACAAGGAAGG / TGGCAACAATCAGCATCCTA | (GAA)n | 56 | 312 | 8 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | No hits found |
PES0004 | CGAAGGTGCACCAAAAGTCT / GGACGAAGACGTGGCTCTAC | (CACCAT)n | 56 | 365 | 5 | 2 | 0.7778 | 0.3457 | 0.2859 | 4 | No hits found |
PES0005 | TGGGTTCAACTTTGGAGGAG / CTCTTTCACCGCAACAGACA | (CAGGT)n | 56 | 243 | 11 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Protein | 5.E-15
PES0006 | CAACCTTTTAATTCCTTTGCTACA / CCGTCTCAATATTCACACTGATCT | (CAT)n | 54 | 172 | 10 | 2 | 0.7778 | 0.3457 | 0.2859 | 2 | Transcription factor gt-3a | 1.E-08
PES0007 | CGAGGAGTCAAAGGTGGAAG / CGCCTGGAAGTTTTTCTTTG | (GAA)n | 56 | 266 | 15 | 2 | 0.7778 | 0.3457 | 0.2859 | 2 | Dehydrin | 1.E-23
PES0008 | AACGTGATGCATGTCGAGAG / GCACCGAGTTTTCCCAAGTA | (TA)n | 54 | 176 | 26 | 3 | 0.7778 | 0.3704 | 0.3402 | 3 | Catalase | 1.E-24
PES0009 | GGAGGCCCGACTTACCTACT / CACGTTGACGTGGCTATCTG | (GGC)n | 54 | 213 | 6 | 2 | 0.8889 | 0.1975 | 0.1780 | 1 | No hits found |
PES0010 | GTCTCGCAAAGAATGTCAGC / CTGCTTTTGCACCTCATAGC | illegible | illegible | 189 | 7 | 2 | 0.7778 | 0.3457 | 0.2859 | 2 | gl-like protein | 2.E-47
PES0011 | TATCCACCAAAACACTACTCATCCT / CCTCTTAGACTCGTCATTAGGTTCA | (ATG)n | 54 | 258 | 7 | 2 | 0.8889 | 0.1975 | 0.1780 | 1 | Predicted: hypothetical protein [V. vinifera] | 2.E-05
PES0012 | ATTTAGCTTGGCTATATGTGAATGG / GGACAGAAGTGAAGCATTTCATAGT | (CAG)n | 54 | 284 | 6 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | No hits found |
PES0013 | TCCTAAATTAGCACTAAACGCACAT / TTGTTTACTAAATTCATGGGAGAGG | (CAG)n | 54 | 162 | 9 | 3 | 0.7778 | 0.3704 | 0.3402 | 1 | DNA binding | 1.E-37
PES0014 | CAACTGCAAAGTCAAAATAATACGA / GTAATCTTCCAGCTATCAAAGACCA | (TA)n | 56 | 180 | 15 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Myb-like transcription factor 1 | 5.E-21
PES0015 | ACAAGAACAATTGTCAAAGGAAGTC / CTTTCAACACCTGAGATGAATCAGT | (TA)n | 56 | 300 | 12 | 3 | 0.7778 | 0.3704 | 0.3402 | 1 | Ribulose bisphosphate (partly illegible) | 1.E-62
PES0016 | GAATGATCATATATACCTCCACTGGT / TAAAATAAGCATTAGCAGAGCCATC | (TA)n | 54 | 252 | 11 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Dehydrin 7 | 2.E-50
PES0017 | TCGGTAAGGATATCATCAACAAAAT / TTTTTGATAAAGACAAGGTCAAAGC | (TC)n | 54 | 174 | 10 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Sterol carrier protein 2-like | 2.E-18
PES0018 | GGTATTGCTCGTGAACTTTGTAACT / CAATAGGAAGAGAAGAAAACCAACA | (TC)n | 54 | 201 | 20 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Aspartyl protease | 8.E-107
PES0019 | GGAAACAGGGGTAGAAGAAGTGTAT / AGTATTTGTTGTTCCTTTCCTGGAT | (TC)n | 56 | 287 | 10 | 3 | 0.7778 | 0.3704 | 0.3402 | 4 | Dcn1-like protein 4 | 2.E-70
PES0020 | CTATACCTCAGCACCAGTTTCAACA / TATCTGCGAATTATTTTCCATGAAT | (CAG)n | 54 | 298 | 6 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | S-RNase-binding protein | 9.E-63
PES0021 | GAAAACATTTGTGTTTCAGTAGGC / AATGAGCTTCAGGTAAATATCATCG | (CAG)n | 56 | 155 | 10 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | 60S ribosomal protein | 5.E-92
PES0022 | CCAAGCCACATAAATCTAGGAGTATC / AAAAACAACAAGTGCAGTTACACAA | (CAG)n | 56 | 154 | 7 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | No hits found |
PES0023 | CACAGTGAGGAAGAAGAAGAAGAAG / ACCTGGAATACTTTCCAATACCG | (CAT)n | 56 | 152 | 8 | 2 | 0.7778 | 0.3457 | 0.2859 | 3 | 60S ribosomal protein 16 | 1.E-54
PES0024 | TAATAATAATTTGATGCGGTTCCAT / GGTGTTGTCAATTAGGAGAGAGAAA | (CAT)n | 56 | 172 | 10 | 2 | 0.7778 | 0.3457 | 0.2859 | 2 | Protein | 5.E-24
PES0025 | AAAATCAATCTCCCATAAATTTGGT / TGATGTTTTGAAACAGAATCTTCAA | (CTG)n | 56 | 219 | 8 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Serine/threonine-protein kinase | 1.E-13
PES0026 | AGAATTTGAAGATGATGAAGAATCG / AAAAGCTTTAGCCAAAGAAAGAGAG | (GAA)n | 56 | 214 | 9 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Protein | 9.E-81
PES0027 | ACTTTTATCCCAAAGCATCTTTTCT / GTTCTACCTCTGAATTGGCACTAGA | (GAA)n | 55 | 181 | 7 | 3 | 0.7778 | 0.3704 | 0.3402 | 1 | No hits found |
PES0028 | GATACCTCAACAAAAATCCATCAAC / ATGACGTTGTGCTTTTATAGCTTCT | (GGA)n | 55 | 154 | 9 | 3 | 0.7778 | 0.3704 | 0.3402 | 3 | Protein | 3.E-70
PES0029 | TACTCCTTCCCCATTTAATTTCTTC / GAGGGCAGTTTTGGTAGATATTTTT | (TCC)n | 56 | 183 | 8 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | AP2/ERF domain-containing transcription factor | 9.E-29
PES0030 | AACCTAATGTCCGGAAGGTAGTAAC / CAATGTATGGAGGAAGGTTTATTTG | (TTA)n | 56 | 259 | 9 | 3 | 0.7778 | 0.3704 | 0.3402 | 3 | Receptor protein | 2.E-42
PES0031 | CGAATTCTGATTCTTGACATTTCTT / CGAAATATGGAGTAAGACGCTGTAT | (TTC)n | 55 | 210 | 7 | 3 | 0.7778 | 0.3704 | 0.3402 | 4 | Protein aq_1857 | 7.E-28
PES0032 | GTGAAGCTTGAACACTTAGAAGAGG / TCATCATACTTTGCTAACACGTCAC | (TCCC)n | 55 | 165 | 6 | 3 | 0.7778 | 0.3704 | 0.3402 | 3 | Oligosaccharyl transferase | 3.E-123
PES0033 | AAAACGAAGAAAATGTTACAAGTGC / GTATCACCACAACATCAAATTACCA | (TCTA)n | 56 | 298 | 6 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | No hits found |
PES0034 | GCTTATCTTGCCTGATAATGTCCTA / CAACATGTAATTACGTCTCTCATGC | (ATCTG)n | 56 | 259 | 5 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Protein | 1.E-59
PES0035 | CAATTTCTTCCGTATAAACCAATTT / ACTCCTACCTGCACAATTTGAATAC | (AATCA)n | 56 | | 5 | 2 | 0.7778 | 0.3457 | 0.2859 | 1 | Predicted protein [Populus trichocarpa] |
PES0036 | AGTGGCAACTCTAGAGAAACTATGC / ATACGTTACATGGCAGCTGAATACT | (CACCC)n | 55 | 259 | 5 | 3 | 0.7778 | 0.3704 | 0.3402 | 1 | Conserved hypothetical protein [Ricinus communis] | illegible
PES0037 | CAAAAGGCTTGCTCTATATTGTGAT / TCCAGTCATTAATACTTGCAACAAA | (CAGGC)n | 56 | 275 | 4 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | (illegible)-related small GTP-binding protein | 1.E-93
PES0038 | TCTCCTTCTGCTGGTAGTAAAAATG / ATAAATTCTGCATCTACGGTGTAGC | (CCGTA)n | 56 | 190 | 4 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Unnamed protein product [Vitis vinifera] | illegible
PES0039 | TGATCCAGTTCAATTTCTTGTTTTC / ATAATCTTTCAATTTCCCGTACACA | (GTTTG)n | 56 | 170 | 4 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | No hits found |
PES0040 | ATTGTTGAAGAAATTGGTTGTTTGT / CTGAGTTCATTTCCTGGAACATAGT | (TTTTC)n | 56 | 270 | 6 | 3 | 0.7778 | 0.3704 | 0.3402 | 3 | Phosphate/phosphoenolpyruvate translocator precursor | illegible
PES0041 | ACTGTTTATGACTGGCTCTACAGTG / GTACCGTCCATGACATAACATAACA | (TTTTG)n | 56 | 282 | 9 | 3 | 0.7778 | 0.3704 | 0.3402 | 1 | Auxin-repressed protein | 2.E-29
PES0042 | TAAATTCACTTTGTGTGTGTGTGTG / GTTCTTGATCGTGATTCTTTTCAAG | (GGGAGA)n | 56 | 229 | 5 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | O-methyltransferase-like protein | illegible
PES0043 | GGGAGACTAATTTTCTTTGCTTTTC / TTCTGATGAGTTCTGTGTAGGAGTG | (TTCTCC)n | 56 | 224 | 3 | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | F-box family protein | 2.E-29
PE0001 | GCCCTAGCCCTAATCAATCC / GGGCCAATGACCTTATACCC | | 55 | 314 | - | 2 | 0.7778 | 0.3457 | 0.2859 | 3 | At4g30930-like protein | 9.E-25
PE0002 | GATCTCGAACCGACGAACTC / AACCATACTGCCAACAATTAAGC | | 54 | 376 | - | 3 | 0.6667 | 0.4938 | 0.4377 | 1 | No hits found |
PE0003 | GCCTTCTGAACTTCCTGGTG / GTCAGGTTCTGCAGGTGGA | | 54 | 154 | - | 2 | 0.7778 | 0.3457 | 0.2859 | 1 | Eukaryotic translation initiation factor 3 |
PE0004 | GGTTCTCGGGACAATGAAAG / ACCCCATTCCCTTCTCTCAC | | 54 | 219 | - | 2 | 0.7778 | 0.3457 | 0.2859 | 2 | Unnamed protein product [Vitis vinifera] |
PE0005 | GCAACATCACCGTCAATGAG / CACAAATTTACCAGCCACCA | | 56 | 264 | - | 3 | 0.6667 | 0.4938 | 0.4377 | 1 | Glycine-rich RNA-binding protein |
PE0006 | TCCTCTGCCACATTTAAGCA / TCATGTTGCAAGAGCAAAGC | | 55 | 309 | - | 2 | 0.7778 | 0.3457 | 0.2859 | 2 | No hits found |
PE0007 | GTGGAAGAGGCAAAACCAAG / AGCCATGCTAGGTCTGTTGG | | 56 | 210 | - | 3 | 0.7778 | 0.3704 | 0.3402 | 2 | Nucleic acid binding | 8.E-44
PE0008 | GGTCTTGGTCTTGGAGTTGG / CCTCCTTGATTTCCACCTGA | | 54 | 221 | - | 2 | 0.6667 | 0.4444 | 0.3457 | 1 | No hits found |
Average | | | | | | 3.3143 | 0.6810 | 0.4568 | 0.4176 | 2.2 | |

Ta, annealing temperature; MAF, major allele frequency; GD, genetic diversity; PIC, polymorphism information content; GES, ginseng expressed sequence tag-simple sequence repeat; PES, Panax expressed sequence tag-simple sequence repeat; PE, Panax expressed sequence tag.
1) Expected amplicon size (bp); 2) No. of repeats; 3) No. of alleles; 4) No. of bands around the expected size in P. ginseng.
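The MAF, GD and PIC columns of Table 2 follow the standard definitions GD = 1 - Σp_i² and PIC = 1 - Σp_i² - Σ_{i<j} 2p_i²p_j². The Python sketch below is not PowerMarker; it only reproduces that arithmetic under the assumption that each of the nine accessions contributes one allele call per locus. Under that assumption it recovers, for example, the 0.7778 / 0.3457 / 0.2859 rows (seven calls of one allele, two of another) and the GES0001 row (one allele in three accessions, six singletons).

```python
from collections import Counter


def marker_stats(alleles):
    """Return (MAF, GD, PIC) for one locus from a list of allele calls.

    GD  = 1 - sum(p_i^2)
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    gd = 1 - sum(p * p for p in freqs)
    pic = gd - sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return round(max(freqs), 4), round(gd, 4), round(pic, 4)


# Assumption: one allele call per accession for nine accessions
print(marker_stats(["a"] * 7 + ["b"] * 2))         # (0.7778, 0.3457, 0.2859)
print(marker_stats(["a"] * 3 + list("bcdefg")))    # (0.3333, 0.8148, 0.7938)
```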

Even though not all SSR sites of each repeat class were surveyed by PCR, our trials showed large differences in PCR success and polymorphism rates depending on repeat-unit length. SSRs with penta- and trinucleotide repeat motifs showed the highest rates of PCR success and of polymorphism detection between Panax species, whereas SSRs with dinucleotide motifs showed the highest polymorphism rate among the ginseng cultivars. Mononucleotide repeats were poorly suited to both PCR amplification and polymorphism detection (Table 1).

Transferability to related species is considered the most important feature of EST-SSR markers, because it yields conserved orthologous markers that can be applied to related species with little genomic information [17]. In this study, primers designed from P. ginseng ESTs were successfully used in the related species P. japonicus, P. quinquefolius and P. notoginseng, with transferability of 100%, 97.1% and 75.7%, respectively, similar to previous studies that showed 100% transferability between P. ginseng and P. quinquefolius [9,11].

Number of bands and estimation of the polyploidy level in Panax ginseng

Recent progress in genomics has uncovered highly replicated, polyploid genomes in most plants [54,55]. The P. ginseng genome is considered tetraploid because of chromosome number variation in the genus, 12 vs. 24 pairs [56]. However, there has been no molecular evidence to determine its ploidy level or to identify a polyploidization event in the Panax species. Polyploidy levels were previously studied in the olive complex (Olea europaea) based on the band numbers of highly polymorphic SSR markers [57]. Different allele numbers were detected in different subspecies, and maxima of four and six alleles were detected in tetraploid and hexaploid subspecies, respectively, consistent with flow cytometry analyses.

To estimate the copy numbers of homologous genes, 
and thus infer the polyploidy level, we counted the 
number of bands around the expected size in four rela- 
tively homogeneous inbred cultivars, assuming that 
different bands may have been derived from recently 
duplicated paralogous genes. Electrophoresis of the 
PCR products revealed various band patterns around 
the expected sizes. Of the 119 successful SSR primer 
pairs, only 17 yielded one specific target band, as shown 
in Fig. 2C, and 49, 22, and 10 pairs produced two, three, 
and four bands, respectively (Table 2 and Fig. 2). The 
other 21 pairs yielded nonspecific faint bands. Overall, 
the data indicate that over 85% of the genes remain as 
duplicates with one to three extra paralogous copies, 
indicating that ginseng has a highly replicated genome 
whose polyploidy level may range from tetra- to octa- 
paleoploidy. These results are similar to or greater than 
the polyploidy level suggested for P. ginseng, which is 
considered a natural tetraploid [56,58]. 

Fig. 2. Non-denaturing polyacrylamide gel electrophoresis of PCR products from nine Panax accessions using different markers: (A) GES0003, 
(B) GES0019, (C) PES0010 and (D) PES0034. Lanes 1-6 indicate P. ginseng accessions, Gumpoong (1), Hwangsook (2), Yunpoong (3), 
Chunpoong (4), Gopoong (5) and Jakyung (6), and lanes 7-9 indicate related Panax species, P. japonicus (7), P. quinquefolius (8) and 
P. notoginseng (9). L, DNA ladder; GES, ginseng expressed sequence tag-simple sequence repeat; PES, Panax expressed sequence tag-simple 
sequence repeat. 
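The band tallies for the 119 primer pairs can be checked with a few lines of arithmetic. In the Python sketch below, the counts come from the text; how the "over 85%" figure was derived is not stated, so the two denominators are assumptions shown side by side. Counting every pair except the 17 single-band pairs (that is, including the 21 pairs with nonspecific faint bands) against all 119 pairs gives 102/119, about 85.7%; restricting the calculation to the 98 pairs with clear bands gives 81/98, about 82.7%.

```python
# Band-number tallies for the 119 successful primer pairs (from the text).
bands_to_pairs = {1: 17, 2: 49, 3: 22, 4: 10}
UNSPECIFIC_FAINT = 21

clear_pairs = sum(bands_to_pairs.values())       # pairs with clear bands
total_pairs = clear_pairs + UNSPECIFIC_FAINT      # all successful pairs
multi_band = clear_pairs - bands_to_pairs[1]      # pairs with 2-4 clear bands

# ASSUMPTION A: every pair that is not single-band counts as multi-copy,
# including the 21 pairs with nonspecific faint bands.
share_all = (total_pairs - bands_to_pairs[1]) / total_pairs
# ASSUMPTION B: only pairs with clear bands are considered.
share_clear = multi_band / clear_pairs

# Mean number of clear bands per primer pair.
mean_bands = sum(k * n for k, n in bands_to_pairs.items()) / clear_pairs

print(f"{total_pairs} pairs; A: {share_all:.1%}; B: {share_clear:.1%}; "
      f"mean bands: {mean_bands:.2f}")
```

Either way, a majority of primer pairs amplify more than one band; the choice of denominator only shifts the share by about three percentage points.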

Different band numbers were detected in the two landrace 
accessions because of heterogeneity among individuals, 
as shown in Fig. 1A. One major polymorphic band was 
detected in the four ginseng cultivars, but two clear bands 
were observed in the two landraces, Hwangsook (2) and 
Jakyung (6). These are presumed to derive from different 
alleles carried by two groups of individuals within each 
landrace, because we used a DNA pool derived from 15 
individual plants (Fig. 2B). Differences in band number 
were also detected between species, such as in lane 7 in 
Fig. 2B and lane 8 in Fig. 2C, which may reflect differ- 
ences in gene copy number. One clear band was ampli- 
fied in most accessions, but two bands were produced in 
a single P. japonicus plant and in a P. quinquefolius DNA 
pool derived from five individuals, indicating that these 
samples are heterozygous at the locus or carry an extra 
paralogous gene copy. 



Phylogenetic analysis of ginseng cultivars and related species

Because most amplicons showed multi-band profiles, 
genotyping was limited to the major bands that appeared 
around the expected size. Variations in amplicon size 
were treated as unweighted, independent characters. 
Major allele frequencies ranged from 0.2222 to 0.8889 
(average, 0.6810), and the number of alleles ranged from 
two to seven (average, 3.3143). Gene diversity and poly- 
morphism information content (PIC) ranged from 0.1975 
to 0.8395 (average, 0.4568) and from 0.1780 to 0.8194 
(average, 0.4176), respectively (Table 2). 
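Gene diversity and PIC are conventionally computed per marker from allele frequencies p_i as H = 1 − Σ p_i² and PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j² (the formula of Botstein and colleagues). This section does not name the software used, so the Python sketch below is an illustration under that assumption, and it also assumes that each of the nine accessions contributes one scored major allele per marker. The two example loci are hypothetical: they happen to reproduce the minimum and maximum values reported above, but which markers actually produced those values is not stated here.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def allele_frequencies(calls: Sequence[str]) -> list[float]:
    """Frequency of each allele among the scored accessions at one marker."""
    counts = Counter(calls)
    n = sum(counts.values())
    return [c / n for c in counts.values()]


def major_allele_frequency(freqs: Sequence[float]) -> float:
    return max(freqs)


def gene_diversity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(2 * (pi * pi) * (pj * pj) for pi, pj in combinations(freqs, 2))
    return gene_diversity(freqs) - cross


# Hypothetical allele calls for nine accessions at two markers.
locus_low = ["a"] * 8 + ["b"]
locus_high = ["a", "a", "b", "b", "c", "d", "e", "f", "g"]

for name, calls in [("low", locus_low), ("high", locus_high)]:
    f = allele_frequencies(calls)
    print(name, len(f), round(major_allele_frequency(f), 4),
          round(gene_diversity(f), 4), round(pic(f), 4))
```

PIC is always slightly below gene diversity because the cross term subtracts the probability that two heterozygous parents share the same genotype; the gap is largest when a few alleles dominate.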

Table 3. Dice's similarity coefficient matrix for nine accessions obtained from data for 70 markers

Accession             (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)
(1) Chunpoong*        1.0000
(2) Yunpoong*         0.7571  1.0000
(3) Gumpoong*         0.7571  0.8429  1.0000
(4) Gopoong*          0.8000  0.8286  0.8000  1.0000
(5) Hwangsook*        0.7857  0.8571  0.8429  0.8000  1.0000
(6) Jakyung*          0.8143  0.8714  0.7857  0.8714  0.8571  1.0000
(7) P. japonicus      0.7286  0.8143  0.8286  0.7857  0.8000  0.7857  1.0000
(8) P. quinquefolius  0.0286  0.0143  0.0143  0.0143  0.0143  0.0143  0.0429  1.0000
(9) P. notoginseng    0.0143  0.0143  0.0143  0.0143  0.0143  0.0143  0.0429  0.2143  1.0000

*Panax ginseng cultivars or accessions. 

[Fig. 3 dendrogram. Leaves, top to bottom: P. ginseng 'Chunpoong', 'Yunpoong', 'Hwangsook', 'Geumpoong', 'Gopoong' and 'Jakyung', P. japonicus, P. quinquefolius, P. notoginseng; x-axis, Dice's similarity coefficient (0.02 to 0.87).]

Fig. 3. Dendrogram of the nine Panax accessions (six P. ginseng accessions and three related species). The tree was constructed from the 
genotypes of 70 markers using unweighted pair group method with arithmetic mean (UPGMA) clustering. Bootstrap values were calculated from 
1,000 replications, and only significant values are shown on the branches. 

[Fig. 4 gel image. F2 population (51 individuals) of Y x C. Lane labels, left to right: L, Y, C, then the 51 F2 genotypes CHHHCHHYYHYHYHHHHHHYHCCYCHYYYHYCYHHYCCHYHYHHYHCHYHH, L; size markers at 100, 200 and 300 bp.]

Fig. 4. Segregation of the polymorphic marker GES0019 in an F2 population from a cross between Yunpoong and Chunpoong. Polymerase chain 
reaction products were separated by non-denaturing polyacrylamide gel electrophoresis. L, DNA ladder; 1, Yunpoong; 2, Chunpoong. The F2 
population includes 51 individuals. Y, C, and H indicate F2 genotypes identical to Yunpoong, identical to Chunpoong, and Yunpoong/Chunpoong 
heterozygous, respectively. GES, ginseng expressed sequence tag-simple sequence repeat. 

A phylogenetic analysis of the nine accessions was 
conducted using 215 allelic data points produced from 
70 markers. Three clades were separated at a similarity 
coefficient of 0.7: a P. ginseng-P. japonicus clade, P. no- 
toginseng, and P. quinquefolius (Table 3 and Fig. 3). 
Notably, P. japonicus clustered among the P. ginseng 
accessions with high similarity coefficients (average, 
0.7905), and 94.3% of the P. japonicus alleles were 
observed in the six P. ginseng accessions. Similarity 
coefficients among P. ginseng accessions ranged from 
0.7571 to 0.8714 (Table 3). P. notoginseng and P. quin- 
quefolius were separated from the P. ginseng-P. japonicus 
clade with similarity coefficients of 0.0209 and 0.0207, 
respectively. The similarity coefficient between P. noto- 
ginseng and P. quinquefolius was 0.2149, which is higher 
than their values with the P. ginseng-P. japonicus clade. 
The similarity coefficients do not agree with the trans- 
ferability values of the markers: 97.1% of the markers 
were amplified in P. quinquefolius but only 75.7% in 
P. notoginseng, which suggests that the P. ginseng-P. 
japonicus clade is much closer to P. quinquefolius than 
to P. notoginseng. This bias might derive from the geno- 
type scoring method, because only band presence was 
counted and non-amplification was treated as missing 
data. Sequence-level analyses, such as the several studies 
based on conserved DNA sequences including internal 
transcribed spacer sequences [59-62] and chloroplast 
DNA [60,63,64], should clarify the phylogenetic relation- 
ships of the species. 
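Dice's coefficient for two accessions with band sets A and B is 2|A ∩ B|/(|A| + |B|). When each accession contributes exactly one scored allele at each of n markers, this reduces to the fraction of markers with a shared allele, and every entry in Table 3 is indeed a multiple of 1/70 to four decimals (for example, 0.7571 = 53/70 and 0.2143 = 15/70). The Python sketch below implements that coefficient and a plain UPGMA (average-linkage) clustering of the Table 3 matrix. It is an illustration, not the authors' pipeline: the software, the handling of missing data and tie-breaking are not described here, bootstrapping is omitted, and the join values it produces need not equal the 0.0209 and 0.0207 quoted in the text.

```python
from itertools import combinations


def dice(a: set, b: set) -> float:
    """Dice similarity 2|A & B| / (|A| + |B|) between two sets of scored bands."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


NAMES = ["Chunpoong", "Yunpoong", "Gumpoong", "Gopoong", "Hwangsook",
         "Jakyung", "P. japonicus", "P. quinquefolius", "P. notoginseng"]
# Lower triangle of Table 3 (row i holds similarities to accessions 0..i-1).
LOWER = [
    [],
    [0.7571],
    [0.7571, 0.8429],
    [0.8000, 0.8286, 0.8000],
    [0.7857, 0.8571, 0.8429, 0.8000],
    [0.8143, 0.8714, 0.7857, 0.8714, 0.8571],
    [0.7286, 0.8143, 0.8286, 0.7857, 0.8000, 0.7857],
    [0.0286, 0.0143, 0.0143, 0.0143, 0.0143, 0.0143, 0.0429],
    [0.0143, 0.0143, 0.0143, 0.0143, 0.0143, 0.0143, 0.0429, 0.2143],
]


def sim(i: int, j: int) -> float:
    """Look up the symmetric similarity between accessions i and j."""
    if i == j:
        return 1.0
    return LOWER[max(i, j)][min(i, j)]


def upgma(n: int):
    """Average-linkage clustering on similarities; returns the merge history."""
    clusters = {k: (k,) for k in range(n)}  # cluster id -> member indices
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            ma, mb = clusters[a], clusters[b]
            # UPGMA: mean similarity over all cross-cluster accession pairs.
            s = sum(sim(i, j) for i in ma for j in mb) / (len(ma) * len(mb))
            if best is None or s > best[0]:
                best = (s, a, b)
        s, a, b = best
        new_id = max(clusters) + 1
        ma, mb = clusters.pop(a), clusters.pop(b)
        clusters[new_id] = ma + mb
        history.append((s, ma, mb))
    return history


for s, ma, mb in upgma(len(NAMES)):
    print(f"{s:.4f}: {[NAMES[i] for i in ma]} + {[NAMES[i] for i in mb]}")

mean_jap = sum(sim(6, j) for j in range(6)) / 6
print(f"mean P. japonicus vs. P. ginseng: {mean_jap:.4f}")
```

Because every similarity within the P. ginseng-P. japonicus group exceeds 0.72 while all values involving P. quinquefolius or P. notoginseng are at most 0.2143, the first six merges stay inside that group, the two outgroup species join each other next, and the final merge separates the two groups, whatever tie-breaking rule is used.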

Reproducibility and utility of the markers 

Most of the markers reported for ginseng have been 
limited to the identification of individuals rather than 
the authentication of cultivars or accessions, because of 
the difficulty of genetic studies and the limited availabil- 
ity of pure inbred lines. Consequently, no inheritance 
study has yet been reported in ginseng. Our purpose was 
to develop stable, reproducible polymorphic markers that 
can discriminate elite cultivars and can be used for 
genetic mapping. We therefore selected markers that 
were polymorphic between DNA pools, each of 15 indi- 
vidual plants representing one accession, so that major 
polymorphisms could be identified despite within- 
accession heterogeneity. Furthermore, to confirm stable 
and reproducible inheritance of the markers, we analyzed 
seven of the GES markers in 51 F2 individuals from a 
cross between 'Yunpoong' and 'Chunpoong', the most 
diverse elite ginseng cultivars (Fig. 4). All seven markers 
segregated with a good fit to the Mendelian 1:2:1 ratio of 
Yunpoong homozygote:heterozygote:Chunpoong homo- 
zygote (p ≥ 0.09 for all markers; Table 4 and Fig. 4), 
indicating that these inheritable, reproducible markers 
can be used to discriminate the cultivars and for mapping 
with the F2 population. 

Table 4. Goodness-of-fit analysis for seven markers in an F2 population from a cross of 'Yunpoong' x 'Chunpoong' 

            Observed value
Marker      Yunpoong   Heterozygote   Chunpoong   χ² value   p-value
GES0003     10         30             11          1.63       0.44
GES0010     11         28             12          0.53       0.77
GES0013     18         24             9           3.35       0.19
GES0014     14         24             13          0.22       0.90
GES0015     18         26             7           4.76       0.09
GES0018     12         23             16          1.12       0.57
GES0019     16         26             9           1.94       0.38

GES, ginseng expressed sequence tag-simple sequence repeat. 
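The χ² and p-values in Table 4 follow from a standard goodness-of-fit test against the expected 1:2:1 ratio with two degrees of freedom, for which the χ² upper-tail probability is exactly exp(−χ²/2). The Python sketch below, using only the standard library, is an illustration of that test rather than the authors' own script. It recomputes Table 4 and also tallies the Y/H/C calls transcribed from the Fig. 4 lane labels for GES0019; that transcription is the only input not copied directly from a table, and it yields the same 16:26:9 counts as Table 4.

```python
import math
from collections import Counter


def chi_square_121(n_y: int, n_h: int, n_c: int) -> tuple[float, float]:
    """Chi-square goodness of fit to a 1:2:1 F2 ratio (2 degrees of freedom)."""
    n = n_y + n_h + n_c
    expected = (n / 4, n / 2, n / 4)
    observed = (n_y, n_h, n_c)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = math.exp(-chi2 / 2)  # exact upper-tail probability for df = 2
    return chi2, p


# Observed Yunpoong : heterozygote : Chunpoong counts from Table 4.
TABLE4 = {
    "GES0003": (10, 30, 11),
    "GES0010": (11, 28, 12),
    "GES0013": (18, 24, 9),
    "GES0014": (14, 24, 13),
    "GES0015": (18, 26, 7),
    "GES0018": (12, 23, 16),
    "GES0019": (16, 26, 9),
}

# Genotype calls for the 51 F2 lanes of GES0019, transcribed from Fig. 4
# (Y = Yunpoong type, H = heterozygote, C = Chunpoong type).
FIG4_F2 = "CHHHCHHYYHYHYHHHHHHYHCCYCHYYYHYCYHHYCCHYHYHHYHCHYHH"
tally = Counter(FIG4_F2)
fig4_counts = (tally["Y"], tally["H"], tally["C"])

for marker, counts in TABLE4.items():
    chi2, p = chi_square_121(*counts)
    print(f"{marker}: chi2 = {chi2:.2f}, p = {p:.2f}")
print("Fig. 4 GES0019 tally (Y, H, C):", fig4_counts)
```

The closed-form p-value avoids a statistics library; for other degrees of freedom a general χ² survival function would be needed instead.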

No segregating population has been reported in P. gin- 
seng, because of the lack of pure inbred lines and of 
genetic studies. Both parental lines, 'Yunpoong' and 
'Chunpoong', showed relatively homogeneous geno- 
types, with less than 10% off-type alleles. Furthermore, 
the two cultivars show distinct agricultural character- 
istics, such as stem number, root shape, disease resist- 
ance and fruit color. 'Yunpoong' is known as the best 
cultivar for high root yield with vigorous growth, and 
'Chunpoong' as the best cultivar for red ginseng pro- 
cessing [4]. We identified 14 polymorphic markers 
between these two cultivars, seven of which gave clear, 
mappable genotype scores for each F2 individual. 
Recently, we obtained large-scale transcriptome se- 
quence data from both parental lines using the Roche 
GS FLX Titanium platform, and 50 Gbp of whole- 
genome sequencing data from one parental line, 
'Chunpoong', using the Illumina Genome Analyzer II 
platform [65]. The application of rapidly evolving next- 
generation sequencing technology, together with this 
mapping population, promises to accelerate high-density 
genetic mapping and complete genome sequencing of 
ginseng, a mysterious medicinal plant. 